# Analysis and Visualization Pipeline: Graduate Migration & Earnings Bias

This directory contains the Stata code used to run the statistical analyses and simulations and to generate the figures and tables in the manuscript. The scripts use the disclosed datasets produced by the SAS processing pipeline.

## 1. Directory Structure & Script Descriptions

### Empirical Analysis & Figure Generation
* **`topschool_bias_graphs.do`**: The primary script for analyzing bias in "Top School" (Barron's Tier 1/Flagship) earnings. It generates scatter plots of betas by state and produces the regression results for Table 4 (regressing cell-level bias on missingness).
* **`bias_bymajor_graphs.do`**: Analyzes and visualizes earnings bias disaggregated by academic major (CIP codes). Produces horizontal bar charts comparing bias across different fields of study.
* **`share_se_graphs.do`**: Generates longitudinal graphs of the share of graduates with national vs. in-state earnings, with a separate focus on self-employment (SE), for both bachelor's and associate's degree holders.
* **`institution_characteristics.do`**: Assembles institutional metadata (SAT scores, admission rates, etc.) from the College Scorecard. It computes differences between "Top" and "Non-Top" schools and exports characteristic tables to Excel.
* **`leave_one_out.do`**: Performs a jackknife-style sensitivity analysis that omits one state at a time, to check that the observed bias patterns are not driven by a single outlier state.
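
The leave-one-out idea can be sketched in Stata as follows. This is a toy illustration on simulated data, not the contents of `leave_one_out.do`: it assumes the check re-estimates a cell-level regression of bias on missingness (in the spirit of Table 4) with one state dropped at a time. The variable names `state`, `bias`, and `missing_share`, and the output file `loo_results.dta`, are hypothetical.

```stata
* Toy leave-one-out (jackknife-style) check on simulated cell-level data.
* Variable names (state, bias, missing_share) are hypothetical placeholders.
clear
set seed 20230908
set obs 500
generate int state = mod(_n - 1, 10) + 1        // 10 hypothetical states, 50 cells each
generate double missing_share = runiform()
generate double bias = 0.5 * missing_share + rnormal(0, 0.1)

* Full-sample slope for reference
regress bias missing_share
local b_full = _b[missing_share]

* Re-estimate the slope omitting one state at a time
tempname results
postfile `results' int dropped_state double b_loo using loo_results, replace
levelsof state, local(states)
foreach s of local states {
    quietly regress bias missing_share if state != `s'
    post `results' (`s') (_b[missing_share])
}
postclose `results'

* Compare each leave-one-out slope with the full-sample slope
use loo_results, clear
generate double b_diff = b_loo - `b_full'
summarize b_loo b_diff
```

If no single omitted state moves the slope much relative to its full-sample value, the pattern is not driven by one outlier state.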

### Simulation Model
* **`lehd_college_migration_v5.do`**: A comprehensive simulation script that models labor force participation, migration decisions, and earnings shocks. It includes modules for:
    * Baseline stylized models.
    * Exogenous vs. endogenous migration loops.
    * Bounding exercises under various monotonicity assumptions.
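
The contrast between exogenous and endogenous migration can be illustrated with a toy simulation. This is a minimal sketch under assumed parameters, not the model in `lehd_college_migration_v5.do`: earnings are treated as observed only for graduates who stay in-state, and migration is either independent of earnings (exogenous) or more likely for graduates with high earnings shocks (endogenous). The variable names, the 30% migration rate, and the shock structure are all hypothetical.

```stata
* Toy simulation: bias in in-state mean earnings when migrants are unobserved.
* All parameters below are illustrative assumptions.
clear
set seed 12345
set obs 100000
generate double ln_earn = rnormal(0, 1)         // log-earnings shock, true mean 0

* Exogenous migration: 30% leave, independent of earnings
generate byte move_exo = runiform() < 0.30

* Endogenous migration: 30% leave, more likely when the earnings shock is high
* (ln_earn + noise has s.d. sqrt(2), so this threshold gives a 30% leave rate)
generate byte move_endo = (ln_earn + rnormal(0, 1)) > invnormal(0.70) * sqrt(2)

* Bias = mean among stayers (observed in-state) minus the full-population mean
quietly summarize ln_earn
local true_mean = r(mean)
quietly summarize ln_earn if !move_exo
local bias_exo = r(mean) - `true_mean'
quietly summarize ln_earn if !move_endo
local bias_endo = r(mean) - `true_mean'

display "Exogenous migration bias:  " %6.3f `bias_exo'
display "Endogenous migration bias: " %6.3f `bias_endo'
```

Under exogenous migration the stayers are a random subsample, so the bias is close to zero; under endogenous migration the stayers are negatively selected, so the in-state mean understates true earnings.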

---

## 2. Requirements and Setup

* **Software**: Stata 16 or higher.
* **Configuration**: Most scripts reference a `config.do` file (not included in this directory) that sets global macros for file paths. Make sure the globals `$datadir`, `$graphdir`, and `$ancillarydir` are defined correctly before running any script.
* **External Data**:
    * **Disclosed Output**: CSV files containing aggregated beta estimates (e.g., `bias_scatter_barrons_byyear_2023-09-08.csv`).
    * **Ancillary Data**: College Scorecard records (`MERGED2006_07_PP.csv`) and migration-specific institution files.
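
For reference, a minimal `config.do` could look like the sketch below. Only the three global names come from this README; the paths are placeholders, which data each global points to is an assumption, and the scripts may expect further globals not shown here.

```stata
* config.do: hypothetical example; replace the placeholder paths with your own.
* Forward slashes work on Windows, macOS, and Linux in Stata.
global datadir      "/path/to/project/disclosed_output"   // assumed: disclosed CSV output
global graphdir     "/path/to/project/figures"            // assumed: where figures are written
global ancillarydir "/path/to/project/ancillary"          // assumed: College Scorecard and institution files
```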

---

## 3. Usage Instructions

1.  **Environment Setup**: Create or open `config.do` and set the global path macros to match your local directory structure.
2.  **Institutional Context**: Run `institution_characteristics.do` to prepare the school-level descriptive statistics.
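3.  **Main Analysis**:
    * Run `topschool_bias_graphs.do` for flagship-level results.
    * Run `bias_bymajor_graphs.do` for major-specific results.
    * Run `share_se_graphs.do` for the national vs. in-state earnings share and self-employment figures.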
4.  **Robustness**: Execute `leave_one_out.do` to verify the stability of the results across the state sample.
5.  **Simulation**: Run `lehd_college_migration_v5.do` to reproduce the stylized model results and bounding exercises.
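
The run order above can be collected into a single master do-file. This is a hypothetical sketch (`run_all.do` is not part of the repository): it assumes `config.do` and all analysis scripts sit in the current working directory, and the position of `share_se_graphs.do` in the sequence is an assumption.

```stata
* run_all.do: hypothetical master script following the run order above.
* Assumes config.do and all scripts are in the current working directory.
version 16
clear all

do "config.do"                                  // 1. set $datadir, $graphdir, $ancillarydir

local scripts institution_characteristics  /// 2. school-level descriptives
              topschool_bias_graphs        /// 3. flagship-level results
              bias_bymajor_graphs          /// 3. major-specific results
              share_se_graphs              /// national vs. in-state shares (position assumed)
              leave_one_out                /// 4. robustness
              lehd_college_migration_v5    //  5. simulation

* Stop before running anything if a script is missing
foreach s of local scripts {
    confirm file "`s'.do"
}

* Run each script in order
foreach s of local scripts {
    display as text _n "Running `s'.do"
    do "`s'.do"
}
```

Checking for every file up front means a missing script fails immediately rather than partway through a long run.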

---

## 4. Output Summary

* **Tables**: `topschool_table.xlsx`, `bias_regs.dta`.
* **Figures**: 
    * `betas_by_state_scatter_[date].png`
    * `share_se_ypg_bacc.png` (plus associate's degree variants).
    * `bias_bymajor` bar charts.
* **Logs**: Each script writes a log file (`.txt` or `.log`) named after the script, for auditing.
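
A logging pattern consistent with this convention is sketched below. The README does not show how the scripts open their logs, so this is an assumption: each script opens a plain-text log named after itself at the top and closes it at the end. The name `example_script` is a hypothetical placeholder.

```stata
* Hypothetical logging header/footer; "example_script" stands in for the script name.
local script_name "example_script"
capture log close _all                                  // close any log left open
log using "`script_name'.log", text replace name(main)  // plain-text log named after the script

display "Run started: `c(current_date)' `c(current_time)'"
* ... analysis code ...

log close main
```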